Principal Component Analysis
PCA (Principal Component Analysis) is a technique that finds the directions of maximum variance in your data, then projects the data onto those directions — reducing dimensions while keeping as much information as possible.
1. Mean centering
\[\bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i\]
\[\tilde{\mathbf{x}}_i = \mathbf{x}_i - \bar{\mathbf{x}}\]
2. Covariance matrix
\[\mathbf{C} = \frac{1}{n-1} \sum_{i=1}^{n} \tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^\top = \frac{1}{n-1} \tilde{\mathbf{X}}^\top \tilde{\mathbf{X}}\]
where \(\tilde{\mathbf{X}} \in \mathbb{R}^{n \times d}\) is the centered data matrix.
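Steps 1 and 2 can be sketched in a few lines of NumPy. The toy array `X` is made up for illustration, and the comparison against `np.cov` is only a sanity check, not part of the derivation.

```python
import numpy as np

# Hypothetical toy data: n = 5 points (rows), d = 2 features (columns).
X = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [3.0, 3.0],
              [4.0, 2.0],
              [1.0, 4.0]])
n = X.shape[0]

x_bar = X.mean(axis=0)             # step 1: per-feature mean, shape (d,)
X_tilde = X - x_bar                # centered data matrix, shape (n, d)
C = X_tilde.T @ X_tilde / (n - 1)  # step 2: covariance matrix, shape (d, d)

# np.cov treats rows as variables by default, so rowvar=False is needed here.
print(np.allclose(C, np.cov(X, rowvar=False)))
```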
3. Eigendecomposition
\[\mathbf{C} \mathbf{v}_k = \lambda_k \mathbf{v}_k, \quad k = 1, \dots, d\]
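with the eigenpairs ordered such that \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0\). Because \(\mathbf{C}\) is symmetric and positive semi-definite, the eigenvalues are real and non-negative, and the eigenvectors can be chosen orthonormal.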
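One way to obtain the ordered eigenpairs, assuming NumPy, is `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, so they need to be reversed. The matrix `C` below is a made-up example.

```python
import numpy as np

# Hypothetical symmetric covariance matrix, for illustration only.
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns ascending eigenvalues and the matching eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(C)

# Reorder so that lambda_1 >= lambda_2 >= ..., keeping columns aligned.
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
V = eigvecs[:, order]

print(lam)
```

The sign of each eigenvector is arbitrary, so different libraries may return \(\mathbf{v}_k\) or \(-\mathbf{v}_k\).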
4. Principal components matrix
\[\mathbf{V}_K = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_K \end{bmatrix} \in \mathbb{R}^{d \times K}\]
5. Projection (encoding)
\[\mathbf{Z} = \tilde{\mathbf{X}} \mathbf{V}_K \in \mathbb{R}^{n \times K}\]
Row \(i\) of \(\mathbf{Z}\), written as a column vector \(\mathbf{z}_i = \mathbf{V}_K^\top \tilde{\mathbf{x}}_i \in \mathbb{R}^K\), is the low-dimensional representation of point \(i\).
6. Reconstruction (decoding)
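\[\hat{\mathbf{X}} = \mathbf{Z} \mathbf{V}_K^\top + \mathbf{1}\bar{\mathbf{x}}^\top = \tilde{\mathbf{X}} \mathbf{V}_K \mathbf{V}_K^\top + \mathbf{1}\bar{\mathbf{x}}^\top\]
where \(\mathbf{1} \in \mathbb{R}^n\) is the all-ones vector, so the mean is added back to every row.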
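Steps 4 to 6 amount to two matrix products. The sketch below assumes NumPy and randomly generated data. The names `X`, `K` and the scaling factors are illustrative choices, and NumPy broadcasting adds the mean back to each row.

```python
import numpy as np

# Hypothetical data: 50 points in 3 dimensions with unequal spread per axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.2]) + np.array([5.0, -2.0, 1.0])

x_bar = X.mean(axis=0)
X_tilde = X - x_bar
C = X_tilde.T @ X_tilde / (X.shape[0] - 1)

eigvals, eigvecs = np.linalg.eigh(C)
V = eigvecs[:, np.argsort(eigvals)[::-1]]  # columns sorted by descending eigenvalue

K = 2
V_K = V[:, :K]                 # step 4: first K principal directions, shape (d, K)
Z = X_tilde @ V_K              # step 5: scores, shape (n, K)
X_hat = Z @ V_K.T + x_bar      # step 6: reconstruction, shape (n, d)

print(Z.shape, X_hat.shape)
```

With \(K = d\) the matrix \(\mathbf{V}_K \mathbf{V}_K^\top\) is the identity and the reconstruction is exact. With \(K < d\) it is the orthogonal projection onto the span of the kept directions.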
7. Reconstruction error
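\[\mathcal{L} = \frac{1}{n-1} \sum_{i=1}^{n} \left\| \mathbf{x}_i - \hat{\mathbf{x}}_i \right\|^2 = \sum_{k=K+1}^{d} \lambda_k\]
With the same \(1/(n-1)\) normalisation used for the covariance matrix, the reconstruction error exactly equals the sum of the discarded eigenvalues. Averaging over \(n\) instead scales the result by \((n-1)/n\).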
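The identity can be checked numerically. The sketch below assumes NumPy and made-up data. It divides the summed squared error by \(n - 1\), matching the covariance definition in step 2, and also shows the \((n-1)/n\) factor that appears when dividing by \(n\).

```python
import numpy as np

# Hypothetical data: 200 points in 4 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
n = X.shape[0]

X_tilde = X - X.mean(axis=0)
C = X_tilde.T @ X_tilde / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
lam, V = eigvals[order], eigvecs[:, order]

K = 2
V_K = V[:, :K]
# x_i - x_hat_i: the mean cancels, leaving the part of the centered point outside span(V_K).
residual = X_tilde - X_tilde @ V_K @ V_K.T
sq_err = np.sum(residual ** 2)

err_n_minus_1 = sq_err / (n - 1)  # same normalisation as C
err_n = sq_err / n                # plain mean over points
discarded = lam[K:].sum()         # sum of the d - K smallest eigenvalues

print(err_n_minus_1, discarded, err_n)
```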
8. Variance explained
\[\text{VE}(K) = \frac{\sum_{k=1}^{K} \lambda_k}{\sum_{k=1}^{d} \lambda_k}\]
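A common use of this ratio is choosing the smallest \(K\) that reaches a target fraction. The sketch below uses made-up eigenvalues and an arbitrary 90% threshold, both of which are assumptions for illustration.

```python
from itertools import accumulate

# Hypothetical eigenvalues, already sorted in descending order.
lam = [4.0, 2.0, 1.5, 0.5]
total = sum(lam)

# ve[K - 1] holds VE(K): cumulative eigenvalue sum divided by the total.
ve = [partial / total for partial in accumulate(lam)]

# Smallest K whose variance explained reaches the (assumed) 0.90 target.
K_min = next(K for K, frac in enumerate(ve, start=1) if frac >= 0.90)

print(ve)     # [0.5, 0.75, 0.9375, 1.0]
print(K_min)  # 3
```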
9. SVD equivalence
PCA can be computed directly via the SVD of the centered data matrix, avoiding explicit covariance computation:
\[\tilde{\mathbf{X}} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\top\]
The principal directions are the right singular vectors \(\mathbf{V}\), and the eigenvalues relate to singular values by:
\[\lambda_k = \frac{\sigma_k^2}{n - 1}\]
The scores are then \(\mathbf{Z} = \mathbf{U}_K \mathbf{\Sigma}_K\).
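The two routes can be compared side by side. This is a minimal sketch assuming NumPy and synthetic data. `np.linalg.svd` returns singular values in descending order and the right singular vectors as the rows of `Vt`.

```python
import numpy as np

# Hypothetical data: 100 points in 3 dimensions with well-separated variances.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.3])
n = X.shape[0]
X_tilde = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix.
C = X_tilde.T @ X_tilde / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
lam, V_eig = eigvals[order], eigvecs[:, order]

# Route 2: thin SVD of the centered data, no covariance matrix formed.
U, s, Vt = np.linalg.svd(X_tilde, full_matrices=False)
lam_svd = s ** 2 / (n - 1)

K = 2
Z_svd = U[:, :K] * s[:K]        # U_K Sigma_K (column k scaled by sigma_k)
Z_proj = X_tilde @ Vt[:K].T     # X_tilde V_K using the SVD's own directions

print(np.allclose(lam, lam_svd), np.allclose(Z_svd, Z_proj))
```

The directions from the two routes agree only up to sign, so comparing `V_eig` with `Vt.T` directly may show flipped columns.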
Question:
What does direction of maximum variance mean?
Answer:
Variance is a measure of how spread out numbers are. When we talk about a direction in 2D or 3D space, the variance in that direction is how spread out the data points are when you squash them onto a line pointing that way.
Imagine shining a flashlight on the data cloud from different angles and measuring how long the shadow is. The direction that casts the longest shadow is the direction of maximum variance.
More precisely: for a unit vector \(\mathbf{v}\), the variance of the data projected onto it is \(\mathbf{v}^\top \mathbf{C}\, \mathbf{v}\). PCA finds the \(\mathbf{v}\) that maximises this.
Drag the angle slider and notice two things:
The shadow widens and narrows. The right panel shows the 1D “shadow” of the data onto the chosen line. A wide, spread-out shadow means high variance in that direction. A narrow, squished shadow means low variance.
The bar fills toward green. The fill bar shows what fraction of the maximum possible variance you’re capturing. It hits 100% exactly at the direction PCA would find — the eigenvector of the covariance matrix.
Formally, the variance along unit vector \(\mathbf{v}\) is:
\[\text{Var}(\mathbf{v}) = \mathbf{v}^\top \mathbf{C}\, \mathbf{v}\]
PCA solves \(\max_{\|\mathbf{v}\|=1} \mathbf{v}^\top \mathbf{C}\, \mathbf{v}\), and the solution is the eigenvector with the largest eigenvalue. That eigenvalue \(\lambda_1\) is exactly the maximum variance — which is why eigenvalues appear in the variance-explained formula.
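The angle sweep can be imitated in plain Python for a 2D case. The covariance entries `a`, `b`, `c` below are made up, and the brute-force sweep is only an illustration of the maximisation, not how PCA is computed in practice.

```python
import math

# Hypothetical 2D covariance matrix [[a, b], [b, c]].
a, b, c = 3.0, 1.0, 1.0

def variance_along(theta):
    """Return v^T C v for the unit vector v = (cos(theta), sin(theta))."""
    vx, vy = math.cos(theta), math.sin(theta)
    return a * vx * vx + 2.0 * b * vx * vy + c * vy * vy

# Sweep half a turn in 0.1 degree steps (v and -v give the same variance).
angles = [math.pi * i / 1800 for i in range(1800)]
best_theta = max(angles, key=variance_along)
best_var = variance_along(best_theta)

# Closed-form largest eigenvalue of a symmetric 2x2 matrix.
lam_1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

print(math.degrees(best_theta), best_var, lam_1)
```

For this matrix the sweep peaks at 22.5 degrees, where the variance matches \(\lambda_1 = 2 + \sqrt{2}\).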
Question:
What is the formula for the variance of the data projected onto an arbitrary vector?
Answer:
For a unit vector \(\mathbf{v} \in \mathbb{R}^d\) (that is, \(\|\mathbf{v}\| = 1\)), the variance of the centered data projected onto it is:
\[\text{Var}(\mathbf{v}) = \mathbf{v}^\top \mathbf{C}\, \mathbf{v}\]
where \(\mathbf{C} = \frac{1}{n-1}\tilde{\mathbf{X}}^\top \tilde{\mathbf{X}}\) is the covariance matrix.
Written out from first principles, the scalar projection of each point \(\tilde{\mathbf{x}}_i\) onto \(\mathbf{v}\) is \(z_i = \mathbf{v}^\top \tilde{\mathbf{x}}_i\), and since the data is already centered the mean projection is zero, so:
\[\text{Var}(\mathbf{v}) = \frac{1}{n-1} \sum_{i=1}^{n} z_i^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{v}^\top \tilde{\mathbf{x}}_i)^2 = \frac{1}{n-1} \sum_{i=1}^{n} \mathbf{v}^\top \tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^\top \mathbf{v} = \mathbf{v}^\top \!\left(\frac{1}{n-1}\sum_{i=1}^n \tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^\top\right) \mathbf{v} = \mathbf{v}^\top \mathbf{C}\, \mathbf{v}\]
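The chain of equalities can be verified numerically. The sketch below assumes NumPy, random data, and an arbitrarily chosen unit vector `v`. It computes the variance once from the scalar projections and once from the quadratic form.

```python
import numpy as np

# Hypothetical data: 30 points in 3 dimensions.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
n = X.shape[0]
X_tilde = X - X.mean(axis=0)
C = X_tilde.T @ X_tilde / (n - 1)

v = np.array([1.0, 2.0, 2.0]) / 3.0    # unit vector, since ||(1, 2, 2)|| = 3

z = X_tilde @ v                        # scalar projections z_i = v^T x_tilde_i
var_direct = np.sum(z ** 2) / (n - 1)  # first-principles form (mean of z is zero)
var_quadratic = v @ C @ v              # quadratic form v^T C v

print(var_direct, var_quadratic)
```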
If \(\mathbf{v}\) is not a unit vector, the normalised version is:
\[\text{Var}(\mathbf{v}) = \frac{\mathbf{v}^\top \mathbf{C}\, \mathbf{v}}{\mathbf{v}^\top \mathbf{v}}\]
This is the Rayleigh quotient of \(\mathbf{C}\). Its maximum over all nonzero \(\mathbf{v}\) is \(\lambda_1\) (the largest eigenvalue), achieved when \(\mathbf{v}\) is any nonzero multiple of \(\mathbf{v}_1\) (the corresponding eigenvector), which is exactly the direction PCA finds.
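Two properties of the Rayleigh quotient are easy to check in code: it ignores the length of \(\mathbf{v}\), and it never exceeds \(\lambda_1\). The helper `rayleigh` and the matrix `C` below are illustrative names and values, assuming NumPy.

```python
import numpy as np

# Hypothetical 2x2 covariance matrix.
C = np.array([[3.0, 1.0],
              [1.0, 1.0]])

def rayleigh(C, v):
    """Rayleigh quotient v^T C v / v^T v for a nonzero vector v."""
    return (v @ C @ v) / (v @ v)

eigvals, eigvecs = np.linalg.eigh(C)       # ascending order
lam_1, v_1 = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue and its eigenvector

# Scale invariance: rescaling v (even flipping its sign) leaves the quotient unchanged.
q_a = rayleigh(C, np.array([1.0, 2.0]))
q_b = rayleigh(C, np.array([-3.0, -6.0]))

# Upper bound: random directions never beat lambda_1.
rng = np.random.default_rng(4)
quotients = np.array([rayleigh(C, v) for v in rng.normal(size=(1000, 2))])

print(q_a, q_b, quotients.max(), lam_1)
```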